{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# INFO8010: Homework 2\n",
    "\n",
    "In the previous homework, you learned how to program your first neural network starting from the very first principles of deep learning. If you managed to solve last assignment without any problems **congratulations!** If that was not the case **don't worry**, here's a second assignment for you which you can use to get better at deep learning.\n",
    "\n",
    "In this homework we will see some slighly more complicated deep learning concepts: we will start by taking a look at some of PyTorch's functionalities that are necessary for training deep networks efficiently. We will then train our first neural networks for tackling different image classification tasks, learn to build custom datasets and explore how to train a CNN.  \n",
    "\n",
    "The strucutre of the notebook is identical to the one of the previous homework. Similarly to last time, you have to submit the notebook **with your solutions** to the exercises. When you encounter a `# your code` comment, you have to write some code yourself and you have to discuss the code/results when you see the instruction\n",
    "\n",
    "> your discussion\n",
    "\n",
    "Without further ado let's start by importing the libraries we will need throughout this assignment!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt \n",
    "import numpy as np\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.optim as optim\n",
    "import torch.utils.data as data\n",
    "\n",
    "from PIL import Image\n",
    "from torchvision import datasets, transforms, utils"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# As of 2022/02/23, the CIFAR10 dataset SSL certificate is outdated which prevents its download.\n",
    "# The following deactivates the verification of the SSL certificates, but\n",
    "# never reproduce this unless you absolutely trust the source.\n",
    "import ssl\n",
    "ssl._create_default_https_context = ssl._create_unverified_context"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Dataloaders\n",
    "\n",
    "Today's first concept are PyTorch's dataloaders. As you have seen during the theoretical lectures, one of the main ingredients for successfully training deep learning models is data, **lots of data**. \n",
    "\n",
    "As you can easily imagine, it is not possible to load datasets of millions of images into the memory of your machine. Furthermore, these images come in a form that does not make it possible to exploit the tensor operations we have seen in the previous assignment. \n",
    "\n",
    "To deal with these issues (and many more of them) we can use [dataloaders](https://pytorch.org/docs/stable/data.html), a data loading utility that allows us to deal with large datasets efficiently. In what follows, you are given your first example of dataloader which will use the popular [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "transform = transforms.Compose([transforms.ToTensor()])\n",
    "trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)\n",
    "testset = datasets.CIFAR10(root='./data', train=False, transform=transform)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's explain what we just did. Thanks to PyTorch's [torchvision](https://pytorch.org/vision/stable/index.html) sub-library, we just downloaded the CIFAR10 dataset on our machine. The dataset was stored in the `./data` folder and comes in two different forms thanks to the use of the `train` flag: a version that can be used as training set, and a version that can be used as testing set. These two datasets are subclasses of `torch`'s `data.Dataset` class. We will see later what this `data.Dataset` class consists in exactly. Torchvision also allows us to define a set of image transformations which we have defined at the beginning of this cell: in this case we would like to convert our images to tensors, see the [documentation](https://pytorch.org/vision/stable/transforms.html#torchvision.transforms.ToTensor) for an exact description of this transformation.\n",
    "\n",
    "Now that we have defined which dataset we would like to use, and the form in which we would like to have our images, we can create our first data loader. Data loaders are objects over which you can iterate and that load, transform and return mini-batches of inputs/targets at each iteration. The advantage of data loaders is that they (can) perform pre-processing of the data in parallel, i.e. in several concurrent worker pools.\n",
    "\n",
    "Here, we create two data loaders that return mini-batches of 4 elements at each iteration. When using stochastic gradient descent (SGD), the training data loader should shuffle the training dataset. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "trainloader = data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)\n",
    "testloader = data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before training anything, let's take a look at the images we just downloaded."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')\n",
    "\n",
    "def show_images(img):\n",
    "    plt.imshow(transforms.functional.to_pil_image(img))\n",
    "    plt.show()\n",
    "\n",
    "images, labels = next(iter(trainloader))\n",
    "show_images(utils.make_grid(images))\n",
    "print(*[classes[l] for l in labels])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `transforms` module comes also in as very handy for performing other type of data transformations: here's an example which transforms the CIFAR10 images into gray scaled images."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "transform = transforms.Compose([transforms.Grayscale(), transforms.ToTensor()])\n",
    "gray_scaled_trainset = datasets.CIFAR10(root='./data', train=True, transform=transform)\n",
    "gray_scaled_trainloader = data.DataLoader(gray_scaled_trainset, batch_size=4, shuffle=True, num_workers=2)\n",
    "\n",
    "images, labels = next(iter(gray_scaled_trainloader))\n",
    "show_images(utils.make_grid(images))\n",
    "print(*[classes[l] for l in labels])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.1 Transforms\n",
    "\n",
    "Al remembered from the theoretical lectures that one way to make neural networks converge faster is to **normalize** the pixel values. He wrote the following code snippet to normalize his training set, but he encountered an error."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "transform = transforms.Compose([\n",
    "    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),\n",
    "    transforms.ToTensor(),\n",
    "])\n",
    "bugged_trainset = datasets.CIFAR10(root='./data', train=True, transform=transform)\n",
    "bugged_trainloader = data.DataLoader(bugged_trainset, batch_size=4, shuffle=True, num_workers=2)\n",
    "\n",
    "images, labels = next(iter(bugged_trainloader))\n",
    "show_images(utils.make_grid(images))  # should look weird due to normalization\n",
    "print(*[classes[l] for l in labels])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fix his mistake."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# your code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Al also remembers that, with image datasets, a common practice to increase the robustness of neural networks is **data augmentation**. He wants to apply random flips (vertical and horizontal) and random color changes to his training set, but he does not know how to. Could you help him?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# your code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.2 Running operations on a GPU\n",
    "\n",
    "As you may know, one important aspect of deep learning is that large models can be trained efficiently on specialized hardwares such as Graphical Processing Units (GPUs) or Tensorial Processing Units (TPUs). PyTorch allows you to perform operations on GPUs very easily by transferring the concerned models and/or tensors to GPUs.\n",
    "\n",
    "However, to do so, you need a CUDA compatible GPU."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "torch.cuda.is_available()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If the result of the previous cell is `True`, everything is ready to run on the GPU and you can continue. Otherwise it means you do not have any GPU that is compatible with the `torch` version installed on your machine. In this case, we invite you to use [Google Colab](https://colab.research.google.com/) to do the rest of this homework. Do not forget to ask Colab for a GPU (in Runtime > Change runtime type > Hardware accelerator)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "device = 'cuda'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's compare the speed of tensor operations on GPU and CPU. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A = torch.randn(1000, 100000)\n",
    "B = torch.randn(100000, 1)\n",
    "\n",
    "# on CPU\n",
    "%timeit A @ B"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "A = torch.randn((1000, 100000), device=device)\n",
    "B = torch.randn((100000, 1), device=device)\n",
    "\n",
    "# on GPU\n",
    "%timeit A @ B"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instead of directly creating a tensor on the GPU you may also transfer a model or a tensor on the GPU, for example we can transfer a simple MLP on the GPU and then back to the CPU as follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create MLP on CPU\n",
    "mlp = nn.Sequential(\n",
    "    nn.Linear(3, 512),\n",
    "    nn.ReLU(), \n",
    "    nn.Linear(512, 512),\n",
    "    nn.ReLU(), \n",
    "    nn.Linear(512, 512),\n",
    "    nn.ReLU(),\n",
    "    nn.Linear(512, 1),\n",
    "    nn.Sigmoid(),\n",
    ")\n",
    "\n",
    "# forward pass on CPU\n",
    "x = torch.randn(256, 3)\n",
    "%timeit mlp(x)\n",
    "\n",
    "# transfer MLP to GPU (in-place)\n",
    "mlp.to(device)\n",
    "\n",
    "# forward pass on GPU\n",
    "x = x.to(device)\n",
    "%timeit mlp(x)\n",
    "\n",
    "# release the GPU memory\n",
    "mlp.to('cpu')\n",
    "x = x.to('cpu')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you may notice, computations are much faster on the GPU. However, data transfer between GPU and CPU (and vice-versa) is usually very slow. We recommend to reduce the transfers of data between GPU and CPU as much as possible. For example when you want to save your loss after each iteration, in order to avoid a memory leak, you should prefer doing `.detach()` rather than `.cpu()` or `.item()`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.  Classifying the CIFAR10 dataset with an MLP\n",
    "\n",
    "Now that you know how to handle datasets, we are ready to properly train today's first deep learning model on the CIFAR10 dataset. Before we dive into it, **do not underestimate** the importance of properly pre-processing the data before training neural networks. This step is as important as defining the neural architectures themselves, but is very often overlooked.\n",
    "\n",
    "In this exercise you are provided with an already defined multi-layer perceptron that you can train to classify CIFAR10 images. The structure of the network is already defined, yet some crucial hyperparameters are missing. It is your job to fill them in and successfully train the network. As part of the exercise, you are also required to monitor the evolution of training: this usually consists in checking how the training and testing losses evolve during training and keeping track of the model's accuracy on the testing set. Report these statistics with some plots. In addition, transfer the network and the mini-batches on GPU to speed up training.\n",
    "\n",
    "Fill in the code below, discuss your choices and your results. Are you satisfied with the final accuracy?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "input_features =\n",
    "output_features = \n",
    "hidden_features = # your code\n",
    "learning_rate =\n",
    "num_epochs =\n",
    "\n",
    "class MLP(nn.Sequential):\n",
    "    def __init__(self, input_features, output_features, hidden_features):\n",
    "        super().__init__(\n",
    "            nn.Flatten(),\n",
    "            nn.Linear(input_features, hidden_features),\n",
    "            nn.ReLU(),\n",
    "            nn.Linear(hidden_features, hidden_features),\n",
    "            nn.ReLU(),\n",
    "            nn.Linear(hidden_features, hidden_features),\n",
    "            nn.ReLU(),\n",
    "            nn.Linear(hidden_features, output_features),\n",
    "        )\n",
    "    \n",
    "network = MLP(input_features, output_features, hidden_features)\n",
    "\n",
    "# your code\n",
    "\n",
    "criterion = nn.CrossEntropyLoss()\n",
    "optimizer = torch.optim.Adam(network.parameters(), lr=learning_rate)\n",
    "transform = transforms.Compose([\n",
    "    transforms.ToTensor(), \n",
    "    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),\n",
    "])\n",
    "\n",
    "trainset = datasets.CIFAR10(root='./data', train=True, transform=transform)\n",
    "testset = datasets.CIFAR10(root='./data', train=False, transform=transform)\n",
    "\n",
    "trainloader = torch.utils.data.DataLoader(trainset, batch_size=100, shuffle=True, num_workers=2)\n",
    "testloader = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=False, num_workers=2)\n",
    "\n",
    "def train(num_epochs):\n",
    "    train_avg_loss = []\n",
    "    test_avg_loss = []\n",
    "    test_accuracy = []\n",
    "\n",
    "    for i in range(num_epochs):\n",
    "        train_losses = []\n",
    "        test_losses = []\n",
    "        \n",
    "        for x, y in trainloader:\n",
    "            # your code\n",
    "            \n",
    "            pred = network(x)\n",
    "            loss = criterion(pred, y)\n",
    "            train_losses.append(loss.detach())\n",
    "            \n",
    "            optimizer.zero_grad()\n",
    "            loss.backward()\n",
    "            optimizer.step()\n",
    "\n",
    "        with torch.no_grad():   \n",
    "            correct = 0\n",
    "            \n",
    "            for x, y in testloader:\n",
    "                # your code\n",
    "                \n",
    "                pred = network(x)\n",
    "                loss = criterion(pred, y)\n",
    "                test_losses.append(loss)\n",
    "                \n",
    "                y_pred = pred.argmax(dim=-1)\n",
    "                correct = correct + (y_pred == y).sum()\n",
    "\n",
    "            accuracy = correct / len(testset)\n",
    "            \n",
    "        # your code\n",
    "\n",
    "    return # your code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_avg_loss, test_avg_loss, test_accuracy = train(num_epochs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Plot the statistics below and discuss your hyperparameter choices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# your code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> your discussion"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.  Create a custom dataset\n",
    "\n",
    "Sometimes you would like to train a model on your own dataset, which will very likely not be part of `torchvision`. To overcome this you can create a custom dataset class which will handle the data for you. This can be done by inheriting from `torch`'s `data.Dataset` class and defining the methods `__len__` and `__getitem__` (see the [documentation](https://pytorch.org/docs/stable/data.htm)).\n",
    "\n",
    "In this exercise your goal is to program a custom dataset class which you will later use for training a CNN. We will use the Kaggle Cats and Dogs dataset which you can download from [here](https://www.microsoft.com/en-us/download/details.aspx?id=54765). Note that some images may have different shapes. It is up to you to deal with this elegantly. In addition, some images may be corrupted. You can simply remove those. \n",
    "\n",
    "When programming a custom dataset class, you have to start by defining the constructor, which will get as input the location of your dataset, whether the images that will be returned will serve for training or testing, and some other potential attributes. For this exercise we will be using 20000 images for training and 5000 images for testing. For the `__getitem__` function you may find the `PIL.Image.open` useful. Do not forget to transform the images into tensors and return the image labels as well ($0$ or $1$)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# your code\n",
    "\n",
    "class CatAndDogsDataset(data.Dataset):\n",
    "    def __init__(self, root_dir, train=True):\n",
    "        \"\"\"Initializes a dataset containing images and labels.\"\"\"\n",
    "        super().__init__()\n",
    "        # your code\n",
    "\n",
    "    def __len__(self):\n",
    "        \"\"\"Returns the size of the dataset.\"\"\"\n",
    "        # your code\n",
    "\n",
    "    def __getitem__(self, index):\n",
    "        \"\"\"Returns the index-th data item of the dataset.\"\"\"\n",
    "        # your code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us have a quick look at these samples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_dataset = CatAndDogsDataset('kagglecatsanddogs_5340/PetImages/', train=True)\n",
    "my_loader = data.DataLoader(my_dataset, batch_size=4, shuffle=True, num_workers=2)\n",
    "\n",
    "# your code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Classifying the Cats and Dogs dataset with a CNN\n",
    "\n",
    "As we have seen in class, classifying images with a multi-layer perceptron isn't really a good idea. Convolutional Neural Networks (CNN) are in fact a much better option for this task. It is now your job to create your custom CNN and train it on the Cats and Dogs Dataset.\n",
    "\n",
    "Similarly to what you have done when classifying the CIFAR10 dataset you are again required to report and discuss the performance of your model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# your code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> your discussion"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Feedback\n",
    "\n",
    "Now that you are done with this final deep-learning assignment here are some final questions about the exercises you were required to solve:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<span style=\"color:blue\">How much time did you spend on this homework?</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<span style=\"color:blue\">Do you feel confortable with what it means to define a neural network and train it?</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<span style=\"color:blue\">Do you think you now have enough preliminary knowledge for successfully starting to work on your course final project?</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<span style=\"color:blue\">If you had to go through the two homeworks again, is there something you would have liked to explore more or explained more into detail?</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
